Private Information Retrieval


PIR-RAG: A System for Private Information Retrieval in Retrieval-Augmented Generation

Wang, Baiqiang, Lou, Qian, Zheng, Mengxin, Zhao, Dongfang

arXiv.org Artificial Intelligence

Retrieval-Augmented Generation (RAG) has become a foundational component of modern AI systems, yet it introduces significant privacy risks by exposing user queries to service providers. To address this, we introduce PIR-RAG, a practical system for privacy-preserving RAG. PIR-RAG employs a novel architecture that uses coarse-grained semantic clustering to prune the search space, combined with a fast, lattice-based Private Information Retrieval (PIR) protocol. This design allows efficient retrieval of entire document clusters, uniquely optimizing for the end-to-end RAG workflow where full document content is required. Our comprehensive evaluation against strong baseline architectures, including graph-based PIR and Tiptoe-style private scoring, demonstrates PIR-RAG's scalability and its superior performance in terms of "RAG-Ready Latency": the true end-to-end time required to securely fetch content for an LLM. Our work establishes PIR-RAG as a viable and highly efficient solution for privacy in large-scale AI systems.
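
To make the two-step architecture concrete, here is a minimal sketch assuming public cluster centroids and fixed-size cluster blobs. A toy two-server XOR-based PIR stands in for the paper's lattice-based single-server protocol, and all names (PirServer, pir_fetch, and so on) are illustrative, not taken from the paper.

# Sketch of the PIR-RAG flow: local cluster routing, then a private
# fetch of the whole cluster. The two-server XOR PIR below is a stand-in
# for the paper's lattice-based single-server scheme.
import os
import numpy as np

RECORD_BYTES = 64  # fixed-size, padded cluster blobs (an assumption)

class PirServer:
    """Holds a replica of the clustered database; answers XOR queries."""
    def __init__(self, records):
        self.records = records  # list of RECORD_BYTES-byte blobs

    def answer(self, selection_bits):
        # XOR together every record whose selection bit is 1.
        acc = bytes(RECORD_BYTES)
        for bit, rec in zip(selection_bits, self.records):
            if bit:
                acc = bytes(a ^ b for a, b in zip(acc, rec))
        return acc

def pir_fetch(server_a, server_b, index, n):
    # Classic 2-server PIR: a random bit mask goes to one server and
    # the mask with bit `index` flipped to the other; the XOR of the
    # two answers is exactly the requested record.
    mask = [os.urandom(1)[0] & 1 for _ in range(n)]
    flipped = mask[:]
    flipped[index] ^= 1
    ans_a = server_a.answer(mask)
    ans_b = server_b.answer(flipped)
    return bytes(a ^ b for a, b in zip(ans_a, ans_b))

def nearest_cluster(query_emb, centroids):
    # Centroids are public metadata, so this routing step is local.
    return int(np.argmin(np.linalg.norm(centroids - query_emb, axis=1)))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    centroids = rng.normal(size=(8, 16))            # 8 clusters, dim 16
    clusters = [f"docs-of-cluster-{i}".ljust(RECORD_BYTES).encode()
                for i in range(8)]
    a, b = PirServer(clusters), PirServer(clusters) # non-colluding replicas
    q = centroids[3] + 0.01 * rng.normal(size=16)   # query near cluster 3
    idx = nearest_cluster(q, centroids)
    blob = pir_fetch(a, b, idx, len(clusters))
    print(idx, blob.rstrip().decode())              # -> 3 docs-of-cluster-3

Neither server learns which cluster was fetched, since each sees only a uniformly random bit vector; the client then passes the recovered documents to the LLM, which is what the "RAG-Ready Latency" metric measures end to end.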


What If, But Privately: Private Counterfactual Retrieval

Meel, Shreya, Nomeir, Mohamed, Dissanayake, Pasan, Dutta, Sanghamitra, Ulukus, Sennur

arXiv.org Artificial Intelligence

Transparency and explainability are two important aspects to consider when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of catering to this requirement. However, this also poses a threat to the privacy of the institution that is providing the explanation, as well as that of the user who is requesting it. In this work, we are primarily concerned with the privacy of a user who wants to retrieve a counterfactual instance without revealing their feature vector to the institution. Our framework retrieves the exact nearest-neighbor counterfactual explanation from a database of accepted points while achieving perfect, information-theoretic privacy for the user. First, we introduce the problem of private counterfactual retrieval (PCR) and propose a baseline PCR scheme that keeps the user's feature vector information-theoretically private from the institution. Building on this, we propose two other schemes that reduce the amount of information leaked about the institution's database to the user, compared to the baseline scheme. Second, we relax the assumption that all features are mutable and consider the setting of immutable PCR (I-PCR). Here, the user retrieves the nearest counterfactual without altering a private subset of their features, which constitutes the immutable set, while keeping their feature vector and immutable set private from the institution. For this, we propose two schemes that preserve the user's privacy information-theoretically but ensure varying degrees of database privacy. Third, we extend our PCR and I-PCR schemes to incorporate the user's preferences on transforming their attributes, so that a more actionable explanation can be received. Finally, we present numerical results to support our theoretical findings and compare the database leakage of the proposed schemes.
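
As a rough illustration of how a query can stay information-theoretically private while exact nearest-neighbor scores are computed, the sketch below additively secret-shares the user's integer-encoded features across two assumed non-colluding servers. It simplifies the paper's baseline scheme considerably, and every name in it (share, server_scores, pcr_retrieve) is hypothetical.

# Baseline PCR idea, simplified: each server sees only a uniformly
# random share of the feature vector x, yet the user can reconstruct
# ||x - y||^2 for every accepted point y and take the exact minimum.
import random

P = 2**61 - 1  # large prime; assumes true scores stay far below P

def share(v):
    # Additive shares of an integer vector: v = s0 + s1 (mod P).
    s0 = [random.randrange(P) for _ in v]
    return s0, [(x - a) % P for x, a in zip(v, s0)]

def server_scores(x_share, db, add_const):
    # Share of ||y||^2 - 2 x.y for each accepted point y. The term is
    # linear in x, so a server can evaluate it on its share alone; the
    # constant ||y||^2 is folded in by one server only.
    out = []
    for y in db:
        dot = sum(xs * yj for xs, yj in zip(x_share, y)) % P
        c = sum(yj * yj for yj in y) if add_const else 0
        out.append((c - 2 * dot) % P)
    return out

def pcr_retrieve(x, db):
    s0, s1 = share(x)                            # user keeps x private
    r0 = server_scores(s0, db, add_const=True)   # server 0's reply
    r1 = server_scores(s1, db, add_const=False)  # server 1's reply
    xx = sum(v * v for v in x)
    dists = [(a + b + xx) % P for a, b in zip(r0, r1)]  # = ||x - y||^2
    best = min(range(len(db)), key=lambda i: dists[i])
    return db[best], dists[best]

x = [3, 7, 1]                                # rejected applicant's features
db = [[10, 2, 5], [4, 6, 1], [0, 0, 9]]      # accepted points
print(pcr_retrieve(x, db))  # -> ([4, 6, 1], 2), the exact nearest neighbor

The user's reconstruction step is where the abstract's database-leakage question arises: the reconstructed distances reveal something about the accepted points, which the paper's further schemes are designed to reduce.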


Federated One-Shot Learning with Data Privacy and Objective-Hiding

Egger, Maximilian, Urbanke, Rüdiger, Bitar, Rawad

arXiv.org Machine Learning

Privacy in federated learning is crucial, encompassing two key aspects: safeguarding the privacy of clients' data and maintaining the privacy of the federator's objective from the clients. While the first aspect has been extensively studied, the second has received much less attention. We present a novel approach that addresses both concerns simultaneously, drawing inspiration from techniques in knowledge distillation and private information retrieval to provide strong information-theoretic privacy guarantees. Traditional private function computation methods could be used here; however, they are typically limited to linear or polynomial functions. To overcome these constraints, our approach unfolds in three stages. In stage 0, clients perform the necessary computations locally. In stage 1, these results are shared among the clients, and in stage 2, the federator retrieves its desired objective without compromising the privacy of the clients' data. The crux of the method is a carefully designed protocol that combines secret-sharing-based multi-party computation and a graph-based private information retrieval scheme. We show that our method outperforms existing tools from the literature when properly adapted to this setting. We consider federated learning (FL), a framework in which a federator and a set of clients with private data collaborate to train a neural network. Due to privacy constraints, the clients' data cannot be directly shared with the federator or among the clients. This privacy concern has been extensively studied in the literature [2]-[6]. There exists a second, often overlooked, privacy concern: ensuring the privacy of the federator's objective used to train the neural network, an aspect that has not been explored in the literature to the same extent. We present a novel approach that ensures the privacy of the clients' data and simultaneously hides the objective of the federator through a careful combination of a secure aggregation method and a tailored private information retrieval (PIR) scheme.
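
A toy end-to-end run of the three stages might look as follows, under strong simplifications: pairwise-canceling masks stand in for the paper's secret-sharing-based multi-party computation, and a basic two-replica linear PIR stands in for its graph-based PIR scheme. All numbers and names are invented for illustration.

# Stage 0: local evaluation of all candidate objectives.
# Stage 1: secure aggregation, so no one sees a single client's values.
# Stage 2: the federator privately retrieves one aggregate entry.
import random

P = 2**31 - 1
K = 4   # candidate objectives; the federator's choice k stays hidden

# Stage 0: each of 3 clients evaluates all K candidate objectives.
client_vals = [[random.randrange(100) for _ in range(K)] for _ in range(3)]

# Stage 1: clients 0 and 1 add pairwise masks that cancel in the sum.
mask = [random.randrange(P) for _ in range(K)]
masked = [
    [(client_vals[0][j] + mask[j]) % P for j in range(K)],
    [(client_vals[1][j] - mask[j]) % P for j in range(K)],
    client_vals[2],
]
aggregate = [sum(col) % P for col in zip(*masked)]  # per-objective sums

# Stage 2: two non-colluding helpers hold the aggregate; the federator
# sends additive shares of a one-hot selector, so neither learns k.
k = 2
e = [int(j == k) for j in range(K)]
q0 = [random.randrange(P) for _ in range(K)]
q1 = [(e[j] - q0[j]) % P for j in range(K)]
a0 = sum(q * v for q, v in zip(q0, aggregate)) % P   # helper 0's answer
a1 = sum(q * v for q, v in zip(q1, aggregate)) % P   # helper 1's answer
print((a0 + a1) % P == sum(c[k] for c in client_vals) % P)  # -> True

The federator recovers only the aggregate for its secret objective k, the clients never see k, and the helpers never see an unmasked client contribution, which is the division of labor the three-stage protocol is built around.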


Private Counterfactual Retrieval With Immutable Features

Meel, Shreya, Dissanayake, Pasan, Nomeir, Mohamed, Dutta, Sanghamitra, Ulukus, Sennur

arXiv.org Artificial Intelligence

In a classification task, counterfactual explanations provide the minimum change needed for an input to be classified into a favorable class. We consider the problem of privately retrieving the exact closest counterfactual from a database of accepted samples while enforcing that certain features of the input sample cannot be changed, i.e., they are immutable. An applicant (user) whose feature vector is rejected by a machine learning model wants to retrieve the sample closest to them in the database without altering a private subset of their features, which constitutes the immutable set. While doing this, the user should keep their feature vector, immutable set, and the resulting counterfactual index information-theoretically private from the institution. We refer to this as the immutable private counterfactual retrieval (I-PCR) problem, which generalizes PCR to a more practical setting. In this paper, we propose two I-PCR schemes by leveraging techniques from private information retrieval (PIR) and characterize their communication costs. Further, we quantify the information that the user learns about the database and compare it across the proposed schemes.
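
One way to see how an immutable set can be enforced without revealing it is to fold a large penalty into the distance score and secret-share everything derived from the user's data, as in the sketch below. This is a simplification of the paper's I-PCR schemes, with an assumed two-server setup and hypothetical names (share, derived, server_eval).

# I-PCR sketch: score(y) = sum_j (1 + M*m_j) * (x_j - y_j)^2, where m is
# the private immutable-set indicator and M is a huge penalty. Expanding
# gives sum_j a_j*y_j^2 - 2*b_j*y_j + c_j, linear in the user-derived
# vectors a, b, c, so servers can evaluate it on additive shares alone
# and learn neither x nor m.
import random

P = 2**61 - 1
M = 10**6  # penalty; assumed far larger than any real squared distance

def share(v):
    s0 = [random.randrange(P) for _ in v]
    return s0, [(x - a) % P for x, a in zip(v, s0)]

def derived(x, m):
    a = [1 + M * mj for mj in m]
    b = [aj * xj for aj, xj in zip(a, x)]
    c = [aj * xj * xj for aj, xj in zip(a, x)]
    return a, b, c

def server_eval(a_sh, b_sh, c_sh, db):
    # Each server evaluates the linear form on its shares alone.
    return [sum(aj * yj * yj - 2 * bj * yj + cj
                for aj, bj, cj, yj in zip(a_sh, b_sh, c_sh, y)) % P
            for y in db]

x, m = [3, 7, 1], [0, 1, 0]         # feature 1 (value 7) is immutable
db = [[4, 6, 1], [3, 7, 9], [0, 7, 2]]
shares = [share(v) for v in derived(x, m)]
r0 = server_eval(*(s[0] for s in shares), db)
r1 = server_eval(*(s[1] for s in shares), db)
scores = [(u + v) % P for u, v in zip(r0, r1)]
print(db[min(range(len(db)), key=lambda i: scores[i])])  # -> [0, 7, 2]

Note that [4, 6, 1] is geometrically closest to x, but it alters the immutable feature and so picks up the M-sized penalty; the retrieved counterfactual [0, 7, 2] is the nearest accepted point that keeps that feature fixed.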


Private Counterfactual Retrieval

Nomeir, Mohamed, Dissanayake, Pasan, Meel, Shreya, Dutta, Sanghamitra, Ulukus, Sennur

arXiv.org Artificial Intelligence

Transparency and explainability are two extremely important aspects to consider when employing black-box machine learning models in high-stakes applications. Providing counterfactual explanations is one way of catering to this requirement. However, this also poses a threat to the privacy of both the institution that is providing the explanation and the user who is requesting it. In this work, we propose multiple schemes inspired by private information retrieval (PIR) techniques which ensure the user's privacy when retrieving counterfactual explanations. We present a scheme which retrieves the exact nearest-neighbor counterfactual explanation from a database of accepted points while achieving perfect (information-theoretic) privacy for the user. While the scheme achieves perfect privacy for the user, some leakage on the database is inevitable, which we quantify using a mutual-information-based metric. Furthermore, we propose strategies to reduce this leakage to achieve an advanced degree of database privacy. We extend these schemes to incorporate the user's preferences on transforming their attributes, so that a more actionable explanation can be received. Since our schemes rely on finite field arithmetic, we empirically validate them on real datasets to understand the trade-off between accuracy and the finite field size.
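
The closing remark about finite field arithmetic can be made concrete with a small fixed-point example: real-valued features are scaled and rounded into integers mod p, so a larger scale improves accuracy, while squared-distance scores grow with the square of the scale and must stay below p to avoid wrap-around. The field size therefore caps the usable precision. The values below are invented for illustration.

# Fixed-point encoding into a prime field and its accuracy trade-off.
P = 2**31 - 1

def encode(v, scale):
    return [round(x * scale) % P for x in v]

def sq_dist_mod(u, w):
    return sum((a - b) ** 2 for a, b in zip(u, w)) % P

x, y = [0.31, -1.27], [0.29, -1.25]
true_d2 = sum((a - b) ** 2 for a, b in zip(x, y))   # about 0.0008
for scale in (10, 100, 1000):
    d2 = sq_dist_mod(encode(x, scale), encode(y, scale)) / scale**2
    print(scale, d2)  # rounding error shrinks as the scale grows

At scale 10 the recovered distance is noticeably off, while at scale 100 and beyond it matches the true value; pushing the scale higher eventually risks scores exceeding P, which is exactly the accuracy-versus-field-size trade-off the abstract refers to.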


A faster way to preserve privacy online

#artificialintelligence

Searching the internet can reveal information a user would rather keep private. For instance, when someone looks up medical symptoms online, they could reveal their health conditions to Google, an online medical database like WebMD, and perhaps hundreds of these companies' advertisers and business partners. For decades, researchers have been crafting techniques that enable users to search for and retrieve information from a database privately, but these methods remain too slow to be effectively used in practice. MIT researchers have now developed a scheme for private information retrieval that is about 30 times faster than other comparable methods. Their technique enables a user to search an online database without revealing their query to the server.


Technique protects privacy when making online recommendations

#artificialintelligence

Algorithms recommend products while we shop online or suggest songs we might like as we listen to music on streaming apps. These algorithms work by using personal information like our past purchases and browsing history to generate tailored recommendations. The sensitive nature of such data makes preserving privacy extremely important, but existing methods for solving this problem rely on heavy cryptographic tools requiring enormous amounts of computation and bandwidth. MIT researchers may have a better solution. They developed a privacy-preserving protocol that is so efficient it can run on a smartphone over a very slow network.